ABSTRACT

We describe a new method, combining enzymatic
manipulation of genomic DNA or mRNA with DNA
microarrays, that is capable of monitoring or
profiling any organisms present in a biological
sample without a priori knowledge of their
genomic sequences. The method seeks to “store”
all the genomic DNA information of the bacterial
communities found in a sample of soil, water,
or air.

The goal is to use genomic information at
the population or community scale to monitor
and detect the appearance of new biota (such as
pathogens) in the environment. The set of
organisms whose genomic DNA has been sequenced
is fairly small. Thus, much information at the
genomic level is not available with
conventional techniques. In addition, many
organisms are not amenable to laboratory
analysis. At the same time, bio-agents used by
terrorist groups are a major threat to our
national security. The DNA-based memory
potentially provides a way to access information
from all organisms in a community or public
space (airports, train terminals, etc.) to
assess impact by human and non-human
biomaterials. It does not require explicit
sequence knowledge, and is quick, flexible, and
inexpensive to implement. Thus, it could provide
a holistic view of the genomic status of the
whole environment.

1.  INTRODUCTION 

We propose a reasoning system based on
storage and manipulation of DNA in vitro that
provides a potentially revolutionary approach to
biological information processing, and might be
used to screen for human disease. For example, a
physiological  condition  may  be  indicated  by 
expression  levels  of  many  genes  and  complex 
relationships  among  them.  The  system  should  be 
capable  of  capturing  a  snapshot  of  an  entire 
biosystem's  state  in  one  memory.  At  a  later  time, 
the  memory  can  be  updated,  or  a  new  snapshot 
acquired,  and  compared  to  a  previous  memory  to 
measure  change. 

In  addition,  snapshots  acquired  under 
different  conditions  can  be  compared  to  reason 
about  common  or  similar  mechanisms  or  effects, 
or  merged  to  form  a  combined  representation. 
The  intimate  interface  of  the  system  to  biology 
might  produce  a  more  capable,  faster,  and 
efficient  system  for  biological  information 
processing.  For  example,  instead  of  clustering 
and  analyzing  patterns  of  gene  expression  from  a 
microarray  on  a  conventional  computer,  it  would 
be  done  by  the  intelligent  DNA  Memory  in  vitro. 

1.1  Background  and  Significance 

We present a DNA-based memory method with
in vitro computational capabilities. It learns
DNA sequences from the microorganisms to which
it is exposed, and can detect ecological changes
from the genomic information of all
microorganisms, known or unknown, in a sample.
The advantages of the DNA memory are a potential
capacity in the exabyte (10^18 bytes) range, and
the capability to match patterns and classify
data based on content. Thus, it could serve as a
large database of heterogeneous information that
could be mined for information based on
similar content.
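
As a rough check on the exabyte figure, the following back-of-the-envelope sketch estimates the raw information content of one gram of single-stranded DNA. The 2 bits per base and the average nucleotide mass of about 330 g/mol are assumptions introduced here, not values from the text, and the estimate ignores redundancy and any encoding overhead.

```python
# Back-of-the-envelope estimate; all constants are illustrative assumptions.
AVOGADRO = 6.022e23          # molecules per mole
NT_MASS_G_PER_MOL = 330.0    # assumed average mass of one ssDNA nucleotide
BITS_PER_BASE = 2            # four bases -> log2(4) bits per base


def bytes_per_gram(nt_mass=NT_MASS_G_PER_MOL, bits_per_base=BITS_PER_BASE):
    """Upper bound on the bytes encodable in one gram of single-stranded DNA."""
    nucleotides = AVOGADRO / nt_mass
    return nucleotides * bits_per_base / 8


capacity = bytes_per_gram()
exabytes = capacity / 1e18
print(f"{capacity:.2e} bytes per gram (about {exabytes:.0f} exabytes)")
```

Under these assumptions the bound lies a few hundred times above 10^18 bytes, so an exabyte-scale capacity per gram remains plausible even if most of the material is redundant.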

With this technology, intelligence would be
incorporated through protocols that separate
and cluster sequences into different groups,
thus, in a sense, choosing between different
classification categories in vitro. For example,
the classification might be harmful pathogen or
not. The method here would perform the pattern
recognition and categorization that is currently
done on digital computers for applications such
as gene expression studies using DNA
microarrays. However, unlike digital computers,
the DNA-based memory has an intimate
connection to the biology and, by processing the
information in vitro, can adapt to and detect
new situations without knowing the sequences
involved. Thus, the method is an attempt to
incorporate intelligence into the contents of a
test tube.

Theoretical  analysis  and  experimental 
results  (see  later)  indicate  the  DNA-based 
memory  has  a  large  capacity  for  separating 
DNA  patterns,  and  has  a  fine  level  of  resolution 
among  sample  DNA.  Although  little  consensus 
exists  on  true  species  diversity,  estimates  for 
the  number  of  living  organisms  on  earth  are  on 
the  order  of  10  to  10  species,  of  which  most 
are microbial. Of the known species, only 113
(8 eukaryotes; 89 prokaryotes; 16
archaea) have had their genomes completely
sequenced (EMBL-EBI, 2002). In the past,
genome-enabled  studies  have  focused  on 
individual  organisms  in  specific  ecosystems.  In 
addition,  standard  techniques  for  studying 
DNA  samples  from  the  environment  require 
some  knowledge  of  the  sequence,  either  for 
PCR  primers  or  for  attachment  to  a  DNA  chip. 
By  focusing  on  single,  known  organisms, 
information  from  other  organisms,  both  known 
and  unknown,  is  lost.  Thus,  a  challenge  is  to 
discover  ways  to  tap  the  large  amounts  of 
information  from  all  organisms  in  an 
environment  and  to  use  that  information  to 
detect  deleterious  changes  to  that  environment, 
such  as  the  existence  of  pathogens. 


1.2 Associative DNA Memory

The idea of processing large amounts of
information in a test tube, and not on a
conventional solid-state computer, presents the
possibility of working with genomic DNA on a
community or population scale. The DNA
computer, in the case described here, is a
laboratory protocol that matches sequence
patterns through DNA-to-DNA reactions
(hybridizations). It thus stores DNA or RNA
sequence information in a DNA-based memory, and
then matches new input to the stored information
based upon sequence content.

The  storage  procedure  is  called  “learning” 
(Valiant,  1984)  because  it  acquires  information 
from  examples  (the  input  DNA),  and  does  so 
without  external  knowledge  of  the  organisms,  or 
their  genomic  sequences.  In  addition,  through  the 
“learning”  process,  the  memory  DNA  acquires 
information  from  all  organisms  in  the  input,  both 
known and unknown. Moreover, processing the
DNA sequences from the environment in vitro
has some additional advantages. The genomic
information  is  processed  in  one  massively 
parallel  step;  likewise,  matching  of  stored 
patterns  with  new  input  can  be  done  in  parallel, 
in  one  step.  Similarity  is  implemented  by  degree 
of  annealing  between  new  input  DNAs  and  the 
stored  memory  DNA  sequences,  thus  providing  a 
technique  for  recognizing  patterns  in  different 
environmental  samples  and  detecting  change. 
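
The idea of similarity as degree of annealing can be illustrated with a toy model. The sketch below scores a probe against a target by the best fraction of Watson-Crick complementary positions over all ungapped alignments. This scoring rule and the function names are assumptions made for illustration; it is not a thermodynamic model of hybridization.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the Watson-Crick reverse complement of a DNA string."""
    return "".join(COMPLEMENT[b] for b in reversed(seq))


def annealing_score(probe, target):
    """Best fraction of probe bases that pair with target at any ungapped offset.

    A score of 1.0 means the target contains the probe's exact reverse
    complement; lower scores stand in for partial (mismatched) annealing.
    """
    site = reverse_complement(probe)   # the sequence the probe would bind to
    n = len(site)
    if n == 0 or len(target) < n:
        return 0.0
    best = 0
    for i in range(len(target) - n + 1):
        matches = sum(1 for a, b in zip(site, target[i:i + n]) if a == b)
        best = max(best, matches)
    return best / n


memory = "ACGTTGCAAG"                                   # hypothetical memory sequence
exact_input = "GG" + reverse_complement(memory) + "TT"  # carries a perfect site
print(annealing_score(memory, exact_input))
```

A threshold on such a score would play the role of hybridization stringency: a high threshold favors specificity, a lower one allows generalization to related inputs.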

As  depicted  in  Figure  1,  instead  of  encoding 
information  directly  into  individual  DNA 
sequences,  the  information  is  stored  as 
combinations  of  DNA  molecules.  In  other  words, 
instead  of  encoding  a  particular  piece  of 
information  as  one  DNA  sequence,  that 
information  is  represented  as  a  collection  of 
sequences.  For  example,  unlike  a  gene  chip,  in 
which  each  spot  represents  a  particular  cDNA 
sequence,  here,  the  input  DNA  sequence  is 
stored as its constituent subsequences. Moreover,
through the DNA memory’s learning and recall
protocols (see later), the target sequence is
decomposed, stored, and recalled in vitro
through its subsequences.

There are several advantages to this
approach. By matching different combinations
of subsequences, the memory would be capable
of generalizing to new input, at the expense of
specificity (Figure 1).

More conventional techniques would access
the genomic information through organisms
with known sequences, and then rely upon a
digital computer for processing the data.
Pattern recognition and interpretation of large
amounts of gene expression or genomic data
are difficult problems for conventional
computers; the described DNA memory
performs them in vitro.

Finally, DNA’s large storage capacity is
used to store genomic information from a
population or community for subsequent
matching. The information is stored in a
compact form, and can serve as a database of
the status of a specific environment at a given
moment in time. Furthermore, when the learned
memory DNA of an environment is attached to
a DNA microarray, readout can be easily
accomplished and interpreted as either a
positive or negative match. In addition, the
system has the potential for other applications,
such as massively parallel detection of pathogens
in food sources or the environment, tracking
genomic transfer from genetically modified
organisms, and storage and retrieval of
non-biological information.

The DNA-based memory is an application
that has the advantages of DNA computing,
mainly its massive parallelism. For example,
the input DNA sequences are learned, stored, and
recalled in one massively parallel step. In
addition, the proposed memory DNA method
turns one of DNA computing’s disadvantages,
namely the imprecision of matching (the
hybridization reaction), to its advantage.
Memory recall is implemented by degree of
annealing between new input DNAs and the
memory sequences, thus providing a technique
for recognizing patterns based upon similarities
in content. Thus, in the current context of DNA
computing, intelligence can be defined as the
ability to acquire and apply knowledge to choose
among several alternatives on a rational basis.

2. RESULTS


2.1  Learning,  Recall,  and  Reasoning 
Protocols 

A  schematic  of  the  DNA  memory  is  shown 
in  Figure  2.  The  initial  sequences  are  a  set  of  tag 
sequences,  to  which  random  sequences  are 
appended  during  synthesis.  In  principle,  the 
starting  set  of  random  probes  contains  every 
possible  sequence  of  a  given  length.  The  tag 
sequences  are  designed  to  be  independent  of 
each  other  in  that  they  will  not  hybridize  to  each 
other (Deaton et al., 2003), and can be used for
output by hybridizing to their complements on a
DNA array, or by biotin-streptavidin bead
separation. With simple and common
recombinant  DNA  operations,  such  as  primer 
extension,  exonuclease  digestion,  and  bead 
extraction,  the  system  learns  the  DNA  sequences 
to  which  it  is  exposed  (Figure  2A). 
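
To give a sense of scale for the random probe library, the following sketch counts the possible sequences of a given length and estimates how often one specific probe would find an exact complementary site in an input of a given size. The uniform-random sequence model and the single-strand site count are simplifying assumptions introduced here; the 20-base probe length and the input of roughly 5 million bases follow the values used elsewhere in this paper.

```python
def probe_space(k):
    """Number of distinct DNA sequences of length k."""
    return 4 ** k


def expected_exact_sites(input_len, k):
    """Expected exact binding sites for one fixed k-mer on one strand of a
    uniformly random input of input_len bases (assumed model)."""
    windows = max(input_len - k + 1, 0)
    return windows / probe_space(k)


k = 20
print(probe_space(k))                       # size of a complete 20-mer library
print(expected_exact_sites(5_000_000, k))   # per-probe expectation on ~5 Mb
```

Under this model a complete 20-mer library is far larger than any single genome, so only a small fraction of probes would find an exact site; the remaining coverage would depend on imperfect hybridization and on the trimming step.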

These learned sequences can then be stored
as a DNA memory (memory strands).
Subsequently, the memory strands can be used to
recall new input sequences, or sequences that are
close under hybridization affinity (Figure 2B).
During learning, the random energy wells present
in the hybridization interactions between
memory strand and input are deepened by the
primer extension process (Figure 2C).

Figure 2: A DNA-based memory with in vitro
learning. The memory strands are composed of
memory-specific sequences and tag sequences
that are used for output.

The detailed learning protocol is shown in
Figure 3. Initially, the memory strands consist
of a tag sequence with biotin attached and short
random probe sequences (20-mers). The input
DNA, which is to be learned, is mixed with the
initial memory strands (tag plus probes). The
probes will hybridize at random locations on
the input DNA. After hybridization, a 3’ to 5’
exonuclease (Exo I) digests probe and input
strands from the 3’ end until a double-stranded
region is encountered (trimming). Then, a 5’ to
3’ extension by DNA polymerase is done
(copying input DNA). The extended memory
strands, tag plus extended Watson-Crick
complement of input, are separated from the
input by binding the 5’ biotin to streptavidin
beads.

The products of the learning procedure are
single-stranded DNAs with a unique tag
attached to random-length 3’ regions that are
complementary to the input DNA, and that
have undergone some amplification during
polymerization. To learn additional inputs, the
process is repeated with another initial memory
strand with a different tag sequence.
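
The learning steps (hybridize, trim, extend, separate) can be sketched in silico as a toy model. The version below assumes exact base pairing, a single-stranded input, trimming of the probe only, and short 8-base probes so that matches occur in a small example; the tag string `TAG1-`, the parameter names, and the extension limit are hypothetical and are not taken from the protocol.

```python
import random

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def revcomp(seq):
    """Watson-Crick reverse complement of a DNA string."""
    return "".join(COMPLEMENT[b] for b in reversed(seq))


def learn(tag, probes, input_dna, min_paired=6, max_extension=40):
    """Toy model of one learning round; returns the extended memory strands.

    Each probe (5'->3') anneals where the input carries its reverse
    complement. Unpaired 3' probe bases are trimmed (exonuclease step), then
    the 3' end is extended along the input toward the input's 5' end
    (polymerase step). The tag stays at the 5' end for later separation.
    """
    memories = []
    for probe in probes:
        # Trimming: shorten the probe from its 3' end until it pairs exactly.
        for paired in range(len(probe), min_paired - 1, -1):
            kept = probe[:paired]
            site = input_dna.find(revcomp(kept))
            if site != -1:
                break
        else:
            continue  # no binding site: the probe is not extended
        # Extension: copy the complement of the template upstream of the site.
        start = max(site - max_extension, 0)
        extension = revcomp(input_dna[start:site])
        memories.append(tag + kept + extension)
    return memories


random.seed(0)
input_dna = "".join(random.choice("ACGT") for _ in range(300))
probes = ["".join(random.choice("ACGT") for _ in range(8)) for _ in range(2000)]
memory_1 = learn("TAG1-", probes, input_dna)
print(len(memory_1), "memory strands learned")
```

Each learned strand, with its tag removed, is the reverse complement of one contiguous region of the input, which is the property the recall step relies on.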


For recall (Figure 4), unknown input is
exposed to the different memory strands. The
input will hybridize to memory sequences that
are close to its Watson-Crick complement. The
specific memory that is recalled can be
determined from the tag that has the highest
concentration of hybridized input. In addition,
sequences complementary to the tags, or the
memory strands themselves, can be attached to a
solid support such as a DNA microarray for easy
output.

Figure 3. The learning protocol.

In Figure 4, M1 to M3 are pre-learned DNA
memories, and t1 to t3 are their respective tags.
The new unknown input DNAs are mixed with the
memory strands, and hybridize with the memory
sequences M1-M3. The hybridization product is
then separated by serial ssDNA affinity columns;
each column carries a specific ssDNA, attached
to cellulose, whose sequence is complementary
to one tag. The DNA bound to a specific column
can be eluted by denaturation. In this way, the
new input DNAs that are most similar to a
specific memory strand can be isolated.
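
Recall can be pictured as choosing the tag whose memory strands capture the most input. The sketch below is a toy model under stated assumptions: each memory is a list of learned strands already stripped of its tag, an input fragment counts as hybridized when it contains the exact reverse complement of some window of a memory strand, and the names `recall`, `hybridizes`, and `window` are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def revcomp(seq):
    """Watson-Crick reverse complement of a DNA string."""
    return "".join(COMPLEMENT[b] for b in reversed(seq))


def hybridizes(fragment, strand, window=8):
    """True if fragment pairs exactly with some window-long stretch of strand."""
    return any(
        revcomp(strand[i:i + window]) in fragment
        for i in range(len(strand) - window + 1)
    )


def recall(memories, fragments, window=8):
    """Return (best_tag, counts), where counts[tag] is the number of input
    fragments captured by at least one strand of that memory."""
    counts = {
        tag: sum(any(hybridizes(f, s, window) for s in strands) for f in fragments)
        for tag, strands in memories.items()
    }
    best_tag = max(counts, key=counts.get)
    return best_tag, counts


input_1 = "ATGGCGTACGTTAGCCTAGGCTAAGCTTCGA"
input_2 = "TTTTGGGGCCCCAAAATTGGCCAATTGGCCA"
memories = {
    "t1": [revcomp(input_1[3:20])],   # stands in for strands learned from input 1
    "t2": [revcomp(input_2[5:22])],   # stands in for strands learned from input 2
}
best, counts = recall(memories, [input_1[0:25]])
print(best, counts)
```

In this sketch a tie, including the case where nothing hybridizes, resolves to the first tag, so a practical readout would also need a minimum signal threshold before reporting a match.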
2.2 Validation of the Learning Protocol

An important property characterizing the
DNA-based memory is the ability of the
learning and recall protocols to distinguish
different sets of DNA. In the experiments
described here, the goal was to test and verify
the basic capabilities of the memory strands,
which include learning of input DNA
sequences, recall of the learned sequences,
differentiation of very different sets of
sequences, and generalization to input DNA
that is close, but not identical, to the learned
sequences. In addition, experiments were done
to test the sensitivity of the recall protocol, and
to determine the extent to which the random
probes cover large DNA input spaces.

To test the learning protocol shown in Figure
3, two plasmids, pBluescript (Input 1; 3 kb) and
φX174 (Input 2; 4.9 kb), were selected as the
input DNAs. These plasmids were chosen
because they have very different sequences, and
thus can be the basis for two unrelated
memories. After digestion with DNA nuclease I,
the starting sets of input DNAs were between 200
and 500 bases long (data not shown). The
starting memory DNAs for the learning protocol
were two distinct, non-cross-hybridizing tag
sequences of length 20 bp that had 20 bases of
random DNA appended to them, for a total
length of 40 bp. With the learning protocol,
Memory 1 strands were trained on pBluescript
(Input 1), and Memory 2 strands on φX174
(Input 2).

A denaturing gel indicates different
distributions for the learned sequences (the
memory strands) for the two input DNAs, and
successful extensions of the initial 40-bp
memory strands to between 60 and 100 bases
(data not shown). This indicates that the learning
protocol is successful at randomly sampling the
input space of DNA, creating a Watson-Crick
complement of the input DNAs, and polishing
the 3’ ends.


Next, the capabilities of the recall procedure
were tested. In the stained gel shown at the left
of Figure 5, the original plasmids, pBluescript
and φX174, are in lanes 2 and 4, respectively.
Lane 1 holds the size ladder, which contains
sequences from pBluescript, and lane 3 holds a
plasmid that shares an ampicillin resistance gene
with pBluescript. As shown at the center of
Figure 5, the stained gel was blotted, and the
Memory 1 DNA obtained from the learning
protocol was radioactively labeled and used as a
probe for Southern blotting. As seen,
Memory 1 hybridized to the DNA in lanes 1 to 3,
thus recalling the input DNA (pBluescript) on
which it was trained (lane 2), and two other
DNAs (lanes 1 and 3) that contain some of
the input sequences but are not identical to the
training set. In addition, there was no
hybridization, and thus no recall, of the very
different input DNA (φX174) in lane 4.

Sensitivity of the technique was also
investigated. Varying amounts of φX174 were
added to a background of pBluescript. DNA
memory strand 2 (trained on φX174) was
able to detect target DNA present at a
concentration of 1% of the background (Figure 6).


Furthermore, we measured the ability of the
starting random probes to cover a much larger
input space. The genome of E. coli (~5 million
base pairs (bp)) was learned and adequately
recalled. In addition, the E. coli genome was
learned together with an additional 219 bp
fragment of DNA from φX174. The results are
shown in Figure 7. After Southern blotting, the
DNA memory strand was able to distinguish the
219 bp fragment from among the
approximately 5 million bp of the E. coli genome
(input DNA), showing the capability for an
adequate level of resolution.


[Figure 6. Sensitivity of recall: stained gel of pBluescript with
increasing amounts of φX174, and the same gel blotted and probed
with Memory strand 2 (M2), trained on φX174. Remainder of caption
illegible in source.]

[Figure 7. Southern blot for the E. coli genome and the 219 bp
φX174 fragment. Remainder of caption illegible in source.]


2.3  Applications  of  the  DNA  memory  to  detect 
pathogens  in  environments  and  medical 
screening  from  gene  expression. 

Two types of DNA microarrays, cDNA arrays
or oligonucleotide arrays, can be used as
an output device for the DNA memory. As
illustrated in Figure 8, each memory strand
learned from the digested genomic DNA of a
specific microorganism would be represented as
a single spot in the reference microarray. The
reference microarrays would consist of spots
corresponding to memories of many different
species of known microorganisms. Differences
in the hybridization patterns produced when new
memory strands probe the references would
indicate changes in the composition of the
microorganisms in the ecosystems. The reference
chips will be made according to the standard
methods of DNA microarray technology (11).
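
One way to read out change from a reference microarray is to treat the spot intensities as a signature vector and compare snapshots. The following is a minimal sketch, assuming normalized intensities per reference spot; the cosine measure, the 0.2 threshold, the organism labels, and the intensity values are illustrative assumptions and not part of the described method.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length intensity vectors."""
    if len(a) != len(b):
        raise ValueError("signatures must have the same number of spots")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def changed_spots(before, after, labels, threshold=0.2):
    """Labels of reference spots whose intensity moved by more than threshold."""
    return [lab for lab, x, y in zip(labels, before, after) if abs(y - x) > threshold]


labels = ["organism_A", "organism_B", "organism_C", "organism_D"]
snapshot_1 = [0.9, 0.1, 0.5, 0.0]   # hypothetical earlier sample
snapshot_2 = [0.9, 0.1, 0.5, 0.8]   # hypothetical later sample, new signal on D
print(round(cosine_similarity(snapshot_1, snapshot_2), 3))
print(changed_spots(snapshot_1, snapshot_2, labels))
```

The whole-vector score answers whether the community has changed at all, while the per-spot comparison points to which reference memories account for the change.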


Figure 8. Reference microarrays to detect changes.

The basic idea is that the whole pattern of spots
on the arrays represents a hybridization
signature for a particular input DNA sample
taken from a specific environment. For these
arrays, each spot consists of whole memory
strands, and thus any particular spot represents
the condition under which that memory was
formed. Thus, with many spots on the array, the
compression of information in the memories
enables an array to represent higher-level
relationships. For instance, the different
patterns, commonalities, and similarities among
many ecosystems can be obtained by probing
each of many reference chips with the
memory strands learned from a specific
ecosystem.

In addition, oligonucleotide arrays (Chee et
al., 1996) with specific oligonucleotide DNA
synthesized on the chip can also be used as an
output device. With this type of chip, instead of
DNA memory strands, sets of designed
sequences can be synthesized directly onto the
arrays (Affymetrix, Santa Clara, CA). These sets
will be designed to cover the hybridization space
using the tools that we have developed (Deaton
et al., 2003a). Synthesized on the oligonucleotide
arrays are known sequences that are subsets of
all possible sequences of a given length, for
example all the possible 10-mers, and they will
be designed (Deaton et al., 2003a, b) to have
specific hybridization properties. Either memory
DNA strands obtained through learning from a
specific input DNA or the input DNA itself can be
hybridized on these arrays, producing a signature
identifying the input condition and a spectrum of
the sequence composition of the input.
Comparison of these results will help
characterize the DNA memory with reasoning.
Potentially, this might detect minor differences
between two closely related input DNAs, for
example the gene expression patterns of normal
cells versus cancer cells.

2.4 Comparison with Current Technology

The essential problem that the proposed
techniques try to solve is the production of a
signature or fingerprint to identify an organism
or a physiological condition from the
composition of genomic material, either genomic
DNA or mRNA, including those sequences
present in low abundance (Fievens et al., 2001).
The DNA memory uses techniques similar to
many current protocols to implement long-term
storage with reasoning capabilities. However, its
application is very different. Current techniques
serve specific purposes, mainly the
identification of individual organisms or
differentially expressed genes, and are thus
usually focused on specific answers to one
problem at a time. Our goals are not the
identification or quantification of individual
organisms or differentially expressed genes, but
the identification and quantification of
populations or physiological conditions. Thus,
our DNA memory takes a higher-level approach
and, as a consequence, provides different
information than current conventional
techniques.
CONCLUSION

In sum, the proposed DNA memory seeks
to answer whether the population of microbes
present in an environment has changed over
time and whether the population might contain
any dangerous species, not what those specific
species are. Likewise, it would answer whether
the physiological condition of an organism has
changed and whether it matches a set of
previously stored conditions, not which specific
gene’s expression level has changed.


Therefore,  the  intelligent  DNA  memory 
attempts  to  capture  global  information  about  a 
population  of  organisms,  or  whole  genome  gene 
expressions  under  certain  conditions,  and 
recognize  patterns  of  contrast  and  commonality 
at  that  higher  level.  For  example,  in  gene 
expression,  the  learning  protocol  captures  the 
entire  population  of  genes  and  their  levels  of 
expression,  not  just  those  that  are  differentially 
expressed.  Moreover,  the  DNA  memory 
incorporates  intelligent  processing  and  reasoning 
capabilities  into  the  test  tube.  It  is  not  simply  a 
lab  technique  to  gather  data  for  analysis  in  an 
electronic  computer,  but  uses  the  massive  scale 
of  storage  and  parallelism  of  DNA  as  the 
computational  tool  to  draw  inferences  on  the 
entire  in  vitro  knowledge  base  quickly  and 
efficiently. 

This  means  that  the  DNA  memory  can 
reason  and  extract  knowledge  in  situations  that 
involve  new  or  unknown  information,  which 
conventional  lab  techniques  and  electronic 
computers  cannot  do.  Examples  of  this  include 
populations  of  microorganisms  with  unknown, 
unsequenced,  or  mutant  organisms  and 
physiological  conditions  with  complicated  or 
unknown  patterns  of  gene  expression. 

Thus, because of its simplicity of
implementation, the proposed work would make
large-scale DNA-based associative memories a
reality in the near term, as well as providing a
convenient mechanism for applications in
biosensing and gene expression. In the future, it
is also possible to apply this technique to
non-biological data. The advantages of a DNA
memory are massive scale and storage density,
with potentially exabyte (10^18 bytes) amounts of
information in a gram of DNA; the massive
parallelism of the search and reasoning protocol,
which could supply substantial speed-ups; and
the capability to search data based upon context
and content, thus providing a semantic
component to the memory.